Data Preprocessing


Deep Learning, Machine Learning, Advancing Big Data Analytics and Management

Hsieh, Weiche, Bi, Ziqian, Chen, Keyu, Peng, Benji, Zhang, Sen, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Liang, Chia Xin, Ren, Jintao, Niu, Qian, Chen, Silin, Yan, Lawrence K. Q., Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Jing, Bowen, Yang, Junjie, Song, Junhao, Liu, Junyu, Liu, Ming

arXiv.org Artificial Intelligence

Advancements in artificial intelligence, machine learning, and deep learning have catalyzed the transformation of big data analytics and management into pivotal domains for research and application. This work explores the theoretical foundations, methodological advancements, and practical implementations of these technologies, emphasizing their role in uncovering actionable insights from massive, high-dimensional datasets. The study presents a systematic overview of data preprocessing techniques, including data cleaning, normalization, integration, and dimensionality reduction, to prepare raw data for analysis. Core analytics methodologies such as classification, clustering, regression, and anomaly detection are examined, with a focus on algorithmic innovation and scalability. Furthermore, the text delves into state-of-the-art frameworks for data mining and predictive modeling, highlighting the role of neural networks, support vector machines, and ensemble methods in tackling complex analytical challenges. Special emphasis is placed on the convergence of big data with distributed computing paradigms, including cloud and edge computing, to address challenges in storage, computation, and real-time analytics. The integration of ethical considerations, including data privacy and compliance with global standards, ensures a holistic perspective on data management. Practical applications across healthcare, finance, marketing, and policy-making illustrate the real-world impact of these technologies. Through comprehensive case studies and Python-based implementations, this work equips researchers, practitioners, and data enthusiasts with the tools to navigate the complexities of modern data analytics. It bridges the gap between theory and practice, fostering the development of innovative solutions for managing and leveraging data in the era of artificial intelligence.
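The preprocessing steps this abstract lists (cleaning, normalization, and so on) can be sketched minimally. This is an illustrative plain-Python sketch, not code from the work itself; real pipelines would use pandas or scikit-learn, and the record layout here is hypothetical.

```python
# Two of the preprocessing steps the abstract mentions: data cleaning
# (here, dropping records with missing fields) and normalization
# (here, z-score standardization of a numeric column).
from statistics import mean, stdev

def clean(rows):
    """Drop records containing a missing (None) field -- a simple cleaning rule."""
    return [r for r in rows if None not in r]

def zscore(values):
    """Standardize a numeric column to zero mean and unit sample variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

rows = [(1.0, 2.0), (None, 3.0), (3.0, 4.0)]   # hypothetical raw records
cleaned = clean(rows)                           # record with None is dropped
col0 = zscore([r[0] for r in cleaned])          # standardized first column
```

Production code would typically replace these helpers with `pandas.DataFrame.dropna` and scikit-learn's `StandardScaler`, which fit the statistics once and reuse them at inference time.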


The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities

Parthasarathy, Venkatesh Balavadhani, Zafar, Ahtsham, Khan, Aafaq, Shahid, Arsalan

arXiv.org Artificial Intelligence

This report examines the fine-tuning of Large Language Models (LLMs), integrating theoretical insights with practical applications. It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. A comparison of fine-tuning methodologies, including supervised, unsupervised, and instruction-based approaches, highlights their applicability to different tasks. The report introduces a structured seven-stage pipeline for fine-tuning LLMs, spanning data preparation, model initialization, hyperparameter tuning, and model deployment. Emphasis is placed on managing imbalanced datasets and optimization techniques. Parameter-efficient methods like Low-Rank Adaptation (LoRA) and Half Fine-Tuning are explored for balancing computational efficiency with performance. Advanced techniques such as memory fine-tuning, Mixture of Experts (MoE), and Mixture of Agents (MoA) are discussed for leveraging specialized networks and multi-agent collaboration. The report also examines novel approaches like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), which align LLMs with human preferences, alongside pruning and routing optimizations to improve efficiency. Further sections cover validation frameworks, post-deployment monitoring, and inference optimization, with attention to deploying LLMs on distributed and cloud-based platforms. Emerging areas such as multimodal LLMs, fine-tuning for audio and speech, and challenges related to scalability, privacy, and accountability are also addressed. This report offers actionable insights for researchers and practitioners navigating LLM fine-tuning in an evolving landscape.
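The Low-Rank Adaptation (LoRA) idea the report covers can be sketched in a few lines: rather than updating a full d x d weight matrix W, learn two small factors B (d x r) and A (r x d) with r << d, and apply W' = W + (alpha / r) * B A. The plain-Python matrices below are purely illustrative; practical fine-tuning would use PyTorch and a library such as PEFT.

```python
# Minimal LoRA-style weight update: W' = W + (alpha / r) * B @ A.

def matmul(X, Y):
    """Naive matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_delta(B, A, alpha, r):
    """Scaled low-rank update (alpha / r) * B @ A."""
    return [[(alpha / r) * v for v in row] for row in matmul(B, A)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weights (d = 2)
B = [[1.0], [0.0]]             # d x r factor, rank r = 1
A = [[0.0, 2.0]]               # r x d factor
delta = lora_delta(B, A, alpha=1.0, r=1)
W_adapted = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
```

Only B and A (2d * r parameters) are trained, while W stays frozen, which is the source of LoRA's memory savings.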


Big Data - Supply Chain Management Framework for Forecasting: Data Preprocessing and Machine Learning Techniques

Jahin, Md Abrar, Shovon, Md Sakib Hossain, Shin, Jungpil, Ridoy, Istiyaque Ahmed, Tomioka, Yoichi, Mridha, M. F.

arXiv.org Machine Learning

This article intends to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies. A novel framework has been proposed incorporating Big Data Analytics in SC Management (problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization), along with the effects of forecasting on the human workforce, inventory, and the overall SC. Initially, the need to collect data according to SC strategy, and how to collect them, has been discussed. The article discusses the need for different types of forecasting according to the period or SC objective. The SC KPIs and error-measurement systems have been recommended for optimizing the top-performing model. The adverse effects of phantom inventory on forecasting, and the dependence of managerial decisions on the SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency, have been illustrated. The cyclic connection within the framework introduces preprocessing optimization based on the post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production, and capacity planning). The contribution of this research lies in the standard SC process framework proposal, the recommended forecasting data analysis, the forecasting effects on SC performance, the machine-learning algorithm optimization followed, and in shedding light on future research.
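The error-measurement systems the framework recommends for ranking forecasting models can be sketched with two common metrics, MAPE and RMSE. Pure Python for illustration, with a hypothetical demand series; the article's own KPI choices may differ.

```python
# Two standard forecast-error metrics used to pick a top-performing model.
import math

def mape(actual, forecast):
    """Mean absolute percentage error (assumes no zero actuals)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

demand   = [100.0, 120.0, 80.0]    # hypothetical observed demand
forecast = [110.0, 115.0, 90.0]    # hypothetical model output
```

MAPE is scale-free and easy to communicate to managers, while RMSE penalizes large misses more heavily; the framework's cyclic KPI feedback would compare models on whichever metric matches the SC objective.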


Understand Data Preprocessing for Effective End-to-End Training of Deep Neural Networks

Gong, Ping, Ma, Yuxin, Li, Cheng, Ma, Xiaosong, Noh, Sam H.

arXiv.org Artificial Intelligence

In this paper, we primarily focus on understanding the data preprocessing pipeline for DNN training in the public cloud. First, we run experiments to test the performance implications of the two major data preprocessing methods, using either raw data or record files. The preliminary results show that data preprocessing is a clear bottleneck, even with the most efficient software and hardware configuration enabled by NVIDIA DALI, a highly optimized data preprocessing library. Second, we identify the potential causes, exercise a variety of optimization methods, and present their pros and cons. We hope this work will shed light on the new co-design of ``data storage, loading pipeline'' and ``training framework'' and flexible resource configurations between them, so that the resources can be fully exploited and performance can be maximized.
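One standard way to hide a preprocessing bottleneck of the kind this paper measures is to overlap preprocessing with training through a bounded prefetch queue. The stdlib producer/consumer sketch below illustrates the pattern only; the `decode` and training steps are stand-ins, not a real DALI pipeline.

```python
# Prefetching: a background thread preprocesses samples ahead of the
# consumer, so the training step does not wait on decode/augment work.
import queue
import threading

def decode(sample):
    return sample * 2              # stand-in for decode/augmentation work

def producer(samples, q):
    for s in samples:
        q.put(decode(s))           # blocks when the buffer is full
    q.put(None)                    # sentinel: end of data

q = queue.Queue(maxsize=4)         # bounded buffer of prefetched samples
threading.Thread(target=producer, args=(range(8), q)).start()

trained = []
while (batch := q.get()) is not None:
    trained.append(batch)          # stand-in for a training step
```

The bounded queue also caps memory use, which matters when preprocessing and training share a cloud instance's resources.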


Data Preprocessing in R for Data Science - Detechtor

#artificialintelligence

In Data Science, Data Preprocessing is a crucial part of building a Machine Learning model. Without it, our Machine Learning models will not work properly. Think of it like preparing a farm to plant crops: without proper preparation, we would have a difficult time planting, and it would negatively affect the crop yield. This is probably going to be the most boring part of this course, but once we are done with it we will have a smoother ride with the rest of the course.


Data Preprocessing with scikit-learn -- Missing Values

#artificialintelligence

By popular demand from my previous article, in this tutorial I illustrate how to preprocess data using scikit-learn, a Python library for machine learning. Data preprocessing transforms data into a format which is more suitable for estimators. In my previous articles I illustrated how to deal with missing values, normalization, standardization, formatting and binning with Python pandas. In this tutorial I show you how to deal with missing values with scikit-learn. For the other preprocessing techniques in scikit-learn, I will write other posts.
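The core idea of the tutorial, filling in missing values, can be sketched with a mean-imputation rule. Plain Python for illustration with a hypothetical column; scikit-learn's `SimpleImputer(strategy="mean")` applies the same rule to whole arrays.

```python
# Mean imputation: replace missing entries with the mean of the observed values.
from statistics import mean

def impute_mean(column):
    """Fill None entries with the mean of the non-missing values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

ages = [25.0, None, 35.0, None, 30.0]   # hypothetical column with gaps
```

Other common strategies (median, most-frequent, constant) follow the same fit-then-fill shape; the right choice depends on the column's distribution and outliers.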


How to Avoid Data Leakage in Data Preprocessing

#artificialintelligence

Avoid data leaking from the test set into the training set. “How to Avoid Data Leakage in Data Preprocessing” is published by Rukshan Pramoditha.
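The rule the article's title states amounts to: compute preprocessing statistics on the training split only, then apply them unchanged to the test split. A minimal sketch with min-max scaling and hypothetical values; the same discipline applies to any fitted transform.

```python
# No leakage: scaling statistics are fitted on train data only.

def fit_minmax(train):
    """Learn scaling statistics from the training split alone."""
    return min(train), max(train)

def transform(values, lo, hi):
    """Apply previously fitted statistics to any split."""
    return [(v - lo) / (hi - lo) for v in values]

train = [10.0, 20.0, 30.0]
test  = [40.0]                        # larger than any training value

lo, hi = fit_minmax(train)            # fitted on train only
scaled_test = transform(test, lo, hi) # may fall outside [0, 1] -- expected
```

Fitting on the full dataset instead would shift `lo` and `hi` using test information, quietly inflating evaluation scores; scikit-learn's `Pipeline` exists largely to make this fit-on-train / transform-on-test discipline automatic.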


The Importance of Data Preprocessing for Machine Learning in the E-Commerce Industry

#artificialintelligence

Big data, as the name suggests, are large volumes of varied data that travel at high velocity. Big data are bound to contain dirty data, as they are collected raw and unprocessed from various sources. Data preprocessing is the process of transforming raw data into an understandable format which is ready for analytical use. Machine Learning is a subset of artificial intelligence and an analytical application that makes decisions by receiving and analyzing data, without explicit programming. The e-commerce industry revolves around the application of technology to commercial business.


Data Preprocessing

#artificialintelligence

While working on Machine Learning and related fields, we often come across huge datasets. In order for this data to be used efficiently by the model, some preprocessing is required to make the data more structured. NOTE: There may be additional steps in between depending on the complexity of the dataset, but the steps mentioned here are standard for almost any dataset.


Data Preprocessing with Python Pandas -- Part 3 Normalisation

#artificialintelligence

This tutorial explains how to preprocess data using the Pandas library. Preprocessing transforms data into a standard, normalised format ahead of analysis. In this tutorial we deal only with normalisation. In my previous tutorials I dealt with missing values and data formatting. Data normalisation involves adjusting values measured on different scales to a common scale.
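The "common scale" idea can be sketched with min-max normalisation, which maps any numeric column onto [0, 1]. Plain Python with hypothetical columns for illustration; the pandas one-liner equivalent is `(s - s.min()) / (s.max() - s.min())` on a `Series`.

```python
# Min-max normalisation: rescale a column so its values span [0, 1].

def minmax(column):
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Two columns on very different scales...
heights_cm = [150.0, 175.0, 200.0]
incomes    = [30000.0, 55000.0, 80000.0]
# ...land on the same [0, 1] scale after normalisation:
# minmax(heights_cm) == minmax(incomes) == [0.0, 0.5, 1.0]
```

Putting features on a common scale prevents large-magnitude columns from dominating distance-based models such as k-nearest neighbours or k-means.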